372 research outputs found

    Understanding High Dimensional Spaces through Visual Means Employing Multidimensional Projections

    Data visualisation helps in understanding data represented by multiple variables, also called features, stored in a large matrix where individuals are stored in rows and variable values in columns. These data structures are frequently called multidimensional spaces. In this paper, we illustrate ways of employing the visual results of multidimensional projection algorithms to understand and fine-tune the parameters of their mathematical framework. Some of the mathematical tools common to these approaches are Laplacian matrices, Euclidean distance, cosine distance, and statistical measures such as the Kullback-Leibler divergence, which is employed to fit probability distributions and reduce dimensions. Two of the relevant algorithms in the data visualisation field are t-distributed stochastic neighbour embedding (t-SNE) and Least Square Projection (LSP). These algorithms can be used to understand a range of mathematical functions, including their impact on datasets. In this article, the mathematical parameters of underlying techniques, such as Principal Component Analysis (PCA) behind t-SNE and mesh reconstruction methods behind LSP, are adjusted to reflect the properties afforded by the mathematical formulation. The results, supported by illustrations of the LSP and t-SNE processes, are meant to inspire students to understand the mathematics behind such methods so that they can apply them effectively to data analysis tasks in multiple applications.
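    The distance and divergence measures named above can be sketched in a few lines. This is a minimal, dependency-free illustration, not the paper's code. It computes Euclidean and cosine distances and the Kullback-Leibler divergence KL(P || Q) that t-SNE minimises, where Q holds Student-t (one degree of freedom) affinities between low-dimensional points. The function names are hypothetical, and the high-dimensional affinities P are assumed to be given rather than computed from perplexity.

```python
import math


def euclidean(u, v):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def cosine_distance(u, v):
    """1 - cosine similarity; 0 for vectors pointing in the same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def kl_divergence(p, q, eps=1e-12):
    """KL(P || Q) for discrete distributions; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)


def student_t_affinities(points):
    """t-SNE-style low-dimensional affinities:
    q_ij proportional to (1 + ||y_i - y_j||^2)^-1 for i != j, normalised over all pairs."""
    n = len(points)
    weights = [
        [0.0 if i == j else 1.0 / (1.0 + euclidean(points[i], points[j]) ** 2) for j in range(n)]
        for i in range(n)
    ]
    total = sum(sum(row) for row in weights)
    return [[w / total for w in row] for row in weights]


# Usage: compare an assumed joint distribution P with the affinities Q of a 2-D layout.
layout = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]]
Q = student_t_affinities(layout)
P = [[0.0, 0.2, 0.3], [0.2, 0.0, 0.0], [0.3, 0.0, 0.0]]  # hypothetical, sums to 1
flat_p = [x for row in P for x in row]
flat_q = [x for row in Q for x in row]
cost = kl_divergence(flat_p, flat_q)  # the quantity t-SNE's gradient descent reduces
```

    Moving the points in `layout` so that `cost` decreases is, in essence, what t-SNE's optimisation loop does.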

    LDPP at the FinNLP-2022 ERAI task: Determinantal point processes and variational auto-encoders for identifying high-quality opinions from a pool of social media posts

    Social media and online forums have made it easier for people to share their views and opinions on various topics in society. In this paper, we focus on posts discussing investment-related topics. When it comes to investment, people can now easily share their opinions about items traded online and provide rationales to support their arguments on social media. However, there are millions of posts to read, and some may come from amateur investors or be completely unrelated. Identifying the most important posts, those that could lead to a higher maximal potential profit (MPP) and a lower maximal loss, is not a trivial task. In this paper, we propose to use determinantal point processes and variational autoencoders to identify high-quality posts from the given rationales. Experimental results suggest that our method selects higher-quality posts than random selection and that latent variable modeling improves the quality of the selected posts.
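    A determinantal point process favours subsets of posts that are both individually high-quality and mutually diverse. The sketch below is an illustration under stated assumptions rather than the authors' system. It builds a quality-diversity kernel L_ij = q_i * S_ij * q_j from hypothetical quality scores q and feature vectors (standing in for the paper's VAE latent representations), then greedily adds the post that most increases det(L_S), a common approximate MAP inference strategy for DPPs.

```python
import math


def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def build_kernel(quality, features):
    """Quality-diversity kernel L_ij = q_i * S_ij * q_j with cosine similarity S."""
    n = len(quality)
    return [
        [quality[i] * cosine_similarity(features[i], features[j]) * quality[j] for j in range(n)]
        for i in range(n)
    ]


def determinant(matrix, tol=1e-12):
    """Determinant via Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < tol:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


def greedy_dpp(kernel, k):
    """Greedily add the item that maximises det(L_S); stop early if no item adds volume."""
    selected = []
    remaining = list(range(len(kernel)))
    while len(selected) < k and remaining:
        best, best_det = None, 0.0
        for i in remaining:
            idx = selected + [i]
            d = determinant([[kernel[a][b] for b in idx] for a in idx])
            if d > best_det:
                best, best_det = i, d
        if best is None:
            break
        selected.append(best)
        remaining.remove(best)
    return selected


# Usage: posts 0 and 1 are duplicates; post 2 is lower quality but distinct.
quality = [1.0, 0.9, 0.8]                        # hypothetical quality scores
features = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]  # hypothetical latent features
chosen = greedy_dpp(build_kernel(quality, features), k=3)
```

    Although k=3 is requested, the greedy search returns only posts 0 and 2: adding the duplicate post 1 would make the determinant zero, which is how the DPP penalises redundancy.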

    Bayes at FigLang 2022 Euphemism detection shared task: Cost-sensitive Bayesian fine-tuning and Venn-Abers predictors for robust training under class skewed distributions

    Transformers have achieved state-of-the-art performance across most natural language processing tasks. However, the performance of these models degrades when they are trained on skewed class distributions (class imbalance), because training tends to be biased towards head classes, which contain most of the data points. Classical methods proposed to handle this problem (re-sampling and re-weighting) often suffer from unstable performance, poor applicability and poor calibration. In this paper, we propose to use Bayesian methods and Venn-Abers predictors for well-calibrated and robust training against class imbalance. Our proposed approach improves the F1-score of the baseline RoBERTa (A Robustly Optimized BERT Pretraining Approach) model by about 6 points (79.0% against 72.6%) when training with class-imbalanced data.
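    An inductive Venn-Abers predictor turns a classifier's raw scores into calibrated probabilities. The sketch below assumes a held-out calibration set of scores and binary labels (the values shown are hypothetical). It fits isotonic regression twice, once with the test score provisionally labelled 0 and once labelled 1, and merges the two estimates p0 and p1 into a single probability p1 / (1 - p0 + p1). It does not reproduce the paper's cost-sensitive Bayesian fine-tuning.

```python
def _isotonic_value_at(points, query):
    """Fit a non-decreasing step function to (score, label) pairs with the
    pool-adjacent-violators algorithm and return its value at `query`."""
    grouped = {}
    for score, label in points:
        total, count = grouped.get(score, (0.0, 0))
        grouped[score] = (total + label, count + 1)

    blocks = []  # each block: [sum_of_labels, count, scores_in_block]
    for score in sorted(grouped):
        total, count = grouped[score]
        blocks.append([total, count, [score]])
        # Merge backwards while adjacent block means violate monotonicity
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            total2, count2, scores2 = blocks.pop()
            blocks[-1][0] += total2
            blocks[-1][1] += count2
            blocks[-1][2].extend(scores2)

    for total, count, scores in blocks:
        if query in scores:
            return total / count
    raise ValueError("query score is not among the fitted points")


def ivap(cal_scores, cal_labels, test_score):
    """Inductive Venn-Abers prediction: returns (p, p0, p1)."""
    calibration = list(zip(cal_scores, cal_labels))
    p0 = _isotonic_value_at(calibration + [(test_score, 0)], test_score)
    p1 = _isotonic_value_at(calibration + [(test_score, 1)], test_score)
    return p1 / (1.0 - p0 + p1), p0, p1


# Usage: hypothetical classifier scores and true labels from a calibration split.
cal_scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
cal_labels = [0, 0, 0, 1, 0, 1, 1, 1]
p_high, p0_high, p1_high = ivap(cal_scores, cal_labels, 0.9)
p_low, p0_low, p1_low = ivap(cal_scores, cal_labels, 0.15)
```

    The gap between p0 and p1 reflects how much the calibration data constrain the estimate at that score; refitting per query is simple but slow, so practical versions precompute the two isotonic fits.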

    Disadvantaged by degrees? How Widening Participation students are not only hindered in accessing HE, but also during – and after – university.

    There is no shortage of literature addressing the range of reasons why more disadvantaged groups are underrepresented in higher education – and particularly in elite universities – in the UK, and it is clear that this has little to do with any real deficiency in ability. This paper begins with an overview of this issue but then extends the argument beyond widening participation at the point of access. It raises concerns emerging from two relatively under-researched areas in the literature, which indicate that ‘widening participation’ (WP) students face greater inequalities than their more affluent peers both during their undergraduate degrees and after them. Although the focus here is on the UK, this topic and many of its themes will be familiar to educationalists and HE practitioners in other countries.

    GGNN@Causal News Corpus 2022: Gated graph neural networks for causal event classification from social-political news articles

    Causality is a core cognitive concept, and detecting mentions of it in text is part of many natural language processing (NLP) applications. In this paper, we study the task of Event Causality Identification (ECI) from social-political news. The aim of the task is to detect causal relationships between pairs of event mentions in text. Although deep learning models have recently achieved state-of-the-art performance on many NLP tasks and applications, most of them still fail to capture the rich semantic and syntactic structures within sentences, which are key for causality classification. We present a solution for causal event detection from social-political news that captures semantic and syntactic information using gated graph neural networks (GGNN) and contextualized language embeddings. Experimental results show that our proposed method outperforms the baseline model, BERT (Bidirectional Encoder Representations from Transformers), in terms of F1-score and accuracy.
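    The core of a gated graph neural network is a propagation step in which each node sums messages from its neighbours and updates its state with a GRU-style gate. The minimal sketch below assumes a single edge type, small hand-rolled matrices and randomly initialised, hypothetical parameters. It is not the paper's architecture, which additionally feeds contextualized language embeddings in as node features; the node names and graph are illustrative only.

```python
import math
import random


def matvec(matrix, vector):
    """Dense matrix-vector product on nested lists."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def random_params(dim, seed=0):
    """Hypothetical parameters: one message matrix plus GRU gate matrices."""
    rng = random.Random(seed)
    names = ["W_msg", "W_z", "U_z", "W_r", "U_r", "W_h", "U_h"]
    return {
        name: [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(dim)]
        for name in names
    }


def ggnn_step(states, edges, params):
    """One GGNN propagation step.

    states: dict node -> state vector; edges: list of (src, dst) pairs.
    a_v = sum of W_msg @ h_u over incoming edges u -> v, then a GRU update of h_v.
    """
    dim = len(next(iter(states.values())))
    incoming = {v: [0.0] * dim for v in states}
    for src, dst in edges:
        msg = matvec(params["W_msg"], states[src])
        incoming[dst] = [a + m for a, m in zip(incoming[dst], msg)]

    new_states = {}
    for v, h in states.items():
        a = incoming[v]
        # Update gate z and reset gate r
        z = [sigmoid(x + y) for x, y in zip(matvec(params["W_z"], a), matvec(params["U_z"], h))]
        r = [sigmoid(x + y) for x, y in zip(matvec(params["W_r"], a), matvec(params["U_r"], h))]
        # Candidate state uses the reset-gated previous state
        reset_h = [ri * hi for ri, hi in zip(r, h)]
        cand = [math.tanh(x + y) for x, y in zip(matvec(params["W_h"], a), matvec(params["U_h"], reset_h))]
        # Interpolate between the old state and the candidate
        new_states[v] = [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]
    return new_states


# Usage: three nodes in a small bidirectional chain, run for three propagation steps.
params = random_params(dim=4)
states = {"cause": [0.1, 0.2, 0.0, -0.1], "verb": [0.0, 0.3, 0.1, 0.0], "effect": [-0.2, 0.0, 0.1, 0.2]}
edges = [("cause", "verb"), ("verb", "cause"), ("verb", "effect"), ("effect", "verb")]
for _ in range(3):
    states = ggnn_step(states, edges, params)
```

    After a few steps, each node's state mixes information from nodes several hops away, which is what lets a classifier on top see structure across the sentence.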

    UCCNLP@SMM4H’22: Label distribution aware long-tailed learning with post-hoc posterior calibration applied to text classification

    The paper describes our submissions for the Social Media Mining for Health (SMM4H) workshop 2022 shared tasks. We participated in two tasks: (1) classification of adverse drug event (ADE) mentions in English tweets (Task 1a) and (2) classification of self-reported intimate partner violence (IPV) on Twitter (Task 7). We proposed an approach that uses RoBERTa (A Robustly Optimized BERT Pretraining Approach) fine-tuned with a label-distribution-aware margin (LDAM) loss function and post-hoc posterior calibration for robust inference against class imbalance. We achieved a 4% and 1% increase in performance on IPV and ADE, respectively, compared with the traditional fine-tuning strategy with unweighted cross-entropy loss.
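    The label-distribution-aware margin loss demands a larger margin for rare classes, with the margin for class j proportional to n_j^(-1/4), where n_j is that class's training count. The sketch below is a minimal single-example version under stated assumptions: max_margin=0.5 and the logit scale s=1.0 are illustrative choices, the class counts and logits are hypothetical, and the post-hoc posterior calibration step is not shown.

```python
import math


def ldam_margins(class_counts, max_margin=0.5):
    """Per-class margins proportional to n_j ** -0.25, rescaled so the rarest class gets max_margin."""
    raw = [1.0 / n ** 0.25 for n in class_counts]
    scale = max_margin / max(raw)
    return [scale * r for r in raw]


def ldam_loss(logits, target, margins, s=1.0):
    """Cross-entropy after subtracting the target class's margin from its logit."""
    adjusted = list(logits)
    adjusted[target] -= margins[target]
    scaled = [s * z for z in adjusted]
    # Numerically stable log-sum-exp
    m = max(scaled)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in scaled))
    return log_sum_exp - scaled[target]


# Usage: hypothetical counts for a 3-class problem with a rare minority class.
margins = ldam_margins([1000, 100, 10])
logits = [0.2, 0.1, 1.5]
plain_ce = ldam_loss(logits, target=2, margins=[0.0, 0.0, 0.0])
margin_ce = ldam_loss(logits, target=2, margins=margins)
```

    Because the minority class must beat the others by a larger margin, its loss stays higher until its logit is pushed further ahead, which counteracts the bias of plain cross-entropy towards head classes.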

    UNLPS at TextGraphs-16 Natural Language Premise Selection task: Unsupervised Natural Language Premise Selection in mathematical text using sentence-MPNet

    This paper describes our system submitted to the TextGraphs 2022 shared task at COLING 2022: Natural Language Premise Selection (NLPS) from mathematical texts. The NLPS task consists of selecting, from a knowledge base written in natural language and mathematical formulae, the mathematical statements (called premises) that are most likely to be used in a particular mathematical proof. We formulated this task as an unsupervised semantic similarity task by first obtaining contextualized embeddings of both the premises and the mathematical proofs using sentence transformers. We then computed the cosine similarity between the embeddings of premises and proofs and selected the premises with the highest cosine scores as the most probable. Our system improves over the baseline system, which uses bag-of-words models based on term frequency-inverse document frequency (TF-IDF), in mean average precision (MAP) by about 23.5% in relative terms (0.1516 versus 0.1228).
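    The retrieval step described above can be sketched as ranking premises by cosine similarity to a proof embedding and scoring the ranking with average precision. In this minimal sketch the two-dimensional vectors are hypothetical stand-ins for sentence-transformer embeddings, and MAP is taken as the mean of per-query average precision, one common definition; the shared task's exact evaluation script may differ.

```python
import math


def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def rank_premises(proof_vec, premise_vecs):
    """Return premise ids sorted by descending cosine similarity to the proof."""
    scores = {pid: cosine_similarity(proof_vec, vec) for pid, vec in premise_vecs.items()}
    return sorted(scores, key=lambda pid: -scores[pid])


def average_precision(ranking, relevant):
    """Mean of precision@i over the ranks i at which a relevant premise appears."""
    hits, total = 0, 0.0
    for i, pid in enumerate(ranking, start=1):
        if pid in relevant:
            hits += 1
            total += hits / i
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(queries):
    """queries: list of (ranking, relevant_set) pairs."""
    return sum(average_precision(r, rel) for r, rel in queries) / len(queries)


# Usage: toy 2-D vectors standing in for sentence embeddings (hypothetical values).
premises = {"a": [1.0, 0.1], "b": [0.0, 1.0], "c": [0.9, 0.5]}
ranking = rank_premises([1.0, 0.0], premises)
score = mean_average_precision([(ranking, {"a", "b"})])
```

    Here the relevant premise "b" is ranked last, so the score drops below 1; the reported relative gain of about 23.5% follows from (0.1516 - 0.1228) / 0.1228.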

    Perspective Chapter: Understanding Thermal Maturity Evolution and Hydrocarbon Cracking – Implication for Cretaceous Awgu and Nkporo Shales, Southeastern Nigeria

    One-dimensional basin modeling was carried out using Schlumberger’s PetroMod modeling software, which provided an understanding of the thermal evolution and the timing of hydrocarbon generation and expulsion of the Coniacian Awgu Shale and the Campanian Nkporo Shale penetrated by the Nzam-1 and Akukwa-2 wells in the lower Benue Trough, Nigeria. The burial temperature and vitrinite reflectance values ranged from 30 to 145°C and 0.5 to 2.9%Ro for the Awgu Formation and from 28 to 125°C and 0.5 to 1.5%Ro for the Nkporo Formation in the Nzam-1 well model; and from 29.5 to 145°C and 0.8 to 2.4%Ro for the Awgu Formation and from 28.5 to 95°C and 0.6 to 0.8%Ro for the Nkporo Formation in the Akukwa-2 well model. The Awgu Shale reached the required threshold of the oil generation window during the mid Campanian (75 Ma) and the late Santonian (82 Ma) in the Nzam-1 and Akukwa-2 well models, respectively. The Nkporo Shale entered the oil window during the early Paleocene (65 Ma) in the Nzam-1 well model and the late Maastrichtian (67 Ma) in the Akukwa-2 well model. This study revealed that valid petroleum system elements exist in the Anambra Basin and that some gaseous hydrocarbons and a little oil may have been generated and expelled. An exponential decrease in temperature over time has favored the preservation of the gas reservoirs and the survival of hydrocarbons in the deep strata. The early maturity of the Nkporo Shale can be attributed to a lack of the requisite burial depth, temperature and pressure for oil generation and expulsion. The post-maturity status of the Awgu Shale may be associated with deeper burial and possibly with the effect of the Santonian tectonic episode.
    • …